Optimization of Collective Reduction Operations
Abstract
Five years of profiling in production mode at the University of Stuttgart have shown that more than 40% of the execution time of Message Passing Interface (MPI) routines is spent in the collective communication routines MPI_Allreduce and MPI_Reduce. Although MPI implementations have now been available for about 10 years and all vendors are committed to the MPI standard, the vendors' and the publicly available reduction algorithms could be accelerated with new algorithms by a factor between 3 (IBM, sum) and 100 (Cray T3E, maxloc) for long vectors. This paper presents five algorithms optimized for different choices of vector size and number of processes. The focus is on bandwidth-dominated protocols for power-of-two and non-power-of-two numbers of processes, optimizing the load balance in communication and computation.
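To make the idea of a bandwidth-dominated reduction concrete, the following is a minimal sketch of one well-known pattern for long vectors: a reduce-scatter by recursive halving followed by an allgather by recursive doubling. It is an illustration only and is not claimed to be any of the five algorithms of the paper. It simulates the ranks inside a single Python process rather than calling MPI, it handles only a power-of-two number of ranks, and the function name `allreduce_halving_doubling` and the `sent` counter are hypothetical names introduced here.

```python
import operator


def allreduce_halving_doubling(vectors, op=operator.add):
    """Simulate an allreduce over len(vectors) ranks in one process.

    vectors[r] is the input vector of rank r. Returns (bufs, sent), where
    bufs[r] is the result held by rank r and sent[r] counts the vector
    elements rank r would have sent. Assumption: power-of-two rank count.
    """
    p = len(vectors)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch handles power-of-two process counts only")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all ranks must contribute vectors of equal length")

    bufs = [list(v) for v in vectors]
    lo = [0] * p      # start of the segment rank r is responsible for
    hi = [n] * p      # end (exclusive) of that segment
    sent = [0] * p

    # Phase 1: reduce-scatter by recursive halving. In every step a pair of
    # ranks splits its shared segment; each keeps and reduces one half.
    dist = p // 2
    while dist >= 1:
        for r in range(p):
            partner = r ^ dist
            if r > partner:
                continue  # handle each pair once, from the lower rank
            mid = (lo[r] + hi[r]) // 2
            for i in range(lo[r], mid):      # lower rank reduces lower half
                bufs[r][i] = op(bufs[r][i], bufs[partner][i])
            for i in range(mid, hi[r]):      # higher rank reduces upper half
                bufs[partner][i] = op(bufs[r][i], bufs[partner][i])
            sent[r] += hi[r] - mid           # lower rank ships the upper half
            sent[partner] += mid - lo[r]     # higher rank ships the lower half
            hi[r] = mid
            lo[partner] = mid
        dist //= 2

    # Phase 2: allgather by recursive doubling. The pairs are revisited in
    # reverse order and exchange their fully reduced segments.
    dist = 1
    while dist < p:
        for r in range(p):
            partner = r ^ dist
            if r > partner:
                continue
            bufs[partner][lo[r]:hi[r]] = bufs[r][lo[r]:hi[r]]
            bufs[r][lo[partner]:hi[partner]] = bufs[partner][lo[partner]:hi[partner]]
            sent[r] += hi[r] - lo[r]
            sent[partner] += hi[partner] - lo[partner]
            hi[r] = hi[partner]
            lo[partner] = lo[r]
        dist *= 2

    return bufs, sent
```

In this sketch, when the vector length n is divisible by the number of ranks p, each rank sends 2n(p-1)/p elements in total, which stays below 2n no matter how many ranks take part. A tree-based reduction followed by a broadcast would instead move the full vector in each of its roughly 2 log2(p) steps, which is why patterns of this kind are attractive for long vectors.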
Similar resources
Optimization Rules for Programming with Collective Operations
We study how several collective operations like broadcast, reduction, scan, etc. can be composed efficiently in complex parallel programs. Our specific contributions are: (1) a formal framework for reasoning about collective operations; (2) a set of optimization rules which save communications by fusing several collective operations into one; (3) performance estimates, which guide the applicati...
The Case for Collective Pattern Specification
Many scientific applications are written in a Bulk Synchronous Parallel style, in which regions of pure computation are separated by communication operations. Unless an existing MPI collective operation can be used, these communication operations are usually written as separate message sends and receives, making analysis and optimization difficult. This style of communication also reduces reada...
Scheduling Post-Distribution Cross-Dock under Demand Uncertainty
The system for distributing goods and services is evolving rapidly, along with other economic developments around the world. In goods distribution, the main focus is on making distribution operations more effective. Because the cross-dock has the advantage of removing intermediaries and reducing the space required for the warehouse, it is worth considering. Among...
Optimization of Collective Communication Operations in MPICH
We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of minimizing latency for short messages and minimizing bandwidth use for long messages. Although we have implemented new algorithms for all MPI (Messa...
Pipelining and Overlapping for MPI Collective Operations
Collective operations are an important aspect of MPI (Message Passing Interface), currently the most important message-passing programming model. Many MPI applications make heavy use of collective operations. Collective operations involve the active participation of a known group of processes and are usually implemented on top of MPI point-to-point message passing. Many optimizations of the used...
Flexible Collective Operations for Distributed Object Groups
Collective operations on multiple distributed objects are a powerful means to coordinate parallel computations. In this paper we present an inheritance-based approach to implement parallel collective operations on distributed object groups. Object groups are described as reusable application-specific classes that coordinate both operation propagation to group members as well as the global collec...